Add spec for the data section string literals feature #76139

jjonescz · 2024-11-28T11:33:51Z

Implementation: #76036
Link to rendered markdown: https://github.com/jjonescz/roslyn/blob/DataSectionStringLiterals-spec/docs/features/string-literals-data-section.md

docs/features/string-literals-data-section.md

AlekseyTs · 2024-12-13T19:03:05Z

@jjonescz Consider updating the PR's title. The "feature flag" is just part of the feature; it is not the feature itself.

AlekseyTs · 2024-12-13T19:12:36Z

Done with review pass (commit 10)

AlekseyTs

LGTM (commit 11)

jcouv

LGTM Thanks (iteration 11)

teo-tsirpanis · 2024-12-26T23:05:54Z

Would be interesting to extend this to encoding constant strings being passed to Convert.FromBase64String, as raw binary data.

The protobuf compiler for example generates such strings. And BTW if I replace string.Concat with +s the compiler gets stuck.

AlekseyTs · 2024-12-26T23:13:19Z

@teo-tsirpanis

> Would be interesting to extend this to encoding constant strings being passed to Convert.FromBase64String, as raw binary data.

Could you please elaborate on what the advantage of doing this would be? Also, how does this align with the specific goal that we are trying to achieve here, which is not to support different encodings for string literals?

jkotas · 2025-02-04T18:02:58Z

docs/features/string-literals-data-section.md

+
+For every unique string literal, a unique internal static class is generated which:
+- has name composed of `<S>` followed by a hex-encoded XXH128 hash of the string
+  (collisions [should not happen][xxh128] with XXH128 and so they aren't currently detected or reported, the behavior in that case is undefined),


Is this a security bug that the collision is undetected?

Collisions can happen with some very small but non-zero probability. If a program is unlucky enough to hit a collision, the string literals will be mixed up and the behavior of the program will be affected in interesting ways. Adversaries can exploit this by contributing an innocent-looking string to an open-source project, where the string is carefully crafted to collide with some other critical string.

There are a few places in the compiler where we use hashes and collisions could lead to program behavior changes. These are part of our security reviews in the past and this will be included in our next one. Thus far nothing has risen to the level of forcing extra checking. There is an increase in non-cryptographic hashing support though and I will make sure that is included in the next security review. @RikkiGibson

Right, XXH128 is a non-cryptographic hash, so one should assume that it is easy to find collisions. If we are going to keep the current implementation for this feature, I would be curious to know how we are justifying that it is ok.

Understand the concern here. But you can create similar collisions today at much lower level places in the compiler by creating collisions with file contents. The use of non-crypto hashes was approved at that layer too (was in fact suggested by security team). The general reasoning is that once you can control code being submitted to the project you have lots of avenues for subverting expectations.

> The general reasoning is that once you can control code being submitted to the project you have lots of avenues for subverting expectations.

In the open-source world, it is important that the project maintainers are able to review the changes with confidence.

For example, let's say that this feature gets enabled for some part of the OpenTelemetry project and somebody submits a PR that includes code like this:

    void DiagnosticThatIsTurnedOffByDefault()
    {
        // Random unique string to help us identify blah
        DoStuff("ruroiodffdshofps09rw403492049358302849052peldfjskdfshgfdshiofofjsekljhkdkjafhkds3");
    }

How can the maintainer of the OpenTelemetry project tell that this change is actually planting a backdoor into unrelated code through a hash collision?

xxHash128 is also used to identify source documents in the IDE currently, as well as to identify interceptable calls in source code. We consider it collision-safe for these purposes. See also the Examples at the bottom of this doc: https://web.archive.org/web/20240223124645/https://fossies.org/linux/xxHash/tests/collisions/README.md

@CyrusNajmabadi in case you have further thoughts.

> xxHash128 is also used to identify source documents in the IDE currently

I would not be worried about the IDE case from the security angle. If somebody crafts a file name with a hash collision and it makes the IDE misbehave or crash, it is not a big deal. (You may be scratching your head for a while if you are tasked with investigating the crash report.)

FWIW, I think it should be straightforward to add collision detection, it just didn't seem straightforward to test that, but I can look into that more.

Actually I think hash collisions might not be a problem - #77061 (comment)
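The collision detection mentioned above could look roughly like the following. This is a minimal sketch and not the Roslyn implementation: `LiteralClassNamer`, `GetClassName`, and the injected `hexHash` delegate are hypothetical names, the delegate stands in for a hex-encoded XXH128, and a real compiler would report a diagnostic instead of throwing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not Roslyn code: maps each string literal to a
// generated class name and detects two different literals sharing a name.
public sealed class LiteralClassNamer
{
    // Assumed to return a hex-encoded hash of the literal (XXH128 in the spec).
    private readonly Func<string, string> _hexHash;

    // Remembers which literal first claimed each generated name.
    private readonly Dictionary<string, string> _literalByName =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public LiteralClassNamer(Func<string, string> hexHash) => _hexHash = hexHash;

    // Returns "<S>" + hash for the literal, or throws if a different
    // literal has already produced the same name.
    public string GetClassName(string literal)
    {
        string name = "<S>" + _hexHash(literal);
        if (_literalByName.TryGetValue(name, out var existing))
        {
            if (!string.Equals(existing, literal, StringComparison.Ordinal))
            {
                throw new InvalidOperationException($"Hash collision on {name}.");
            }
        }
        else
        {
            _literalByName.Add(name, literal);
        }
        return name;
    }
}
```

Since real XXH128 collisions are impractical to produce in a test, one way to exercise the check is to inject a deliberately weak hash such as `s => s.Length.ToString("X")`, under which `"ab"` and `"cd"` collide.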

jkotas · 2025-02-04T18:15:34Z

docs/features/string-literals-data-section.md

+
+### Runtime support
+
+We could emit the strings to data fields similarly but we would not synthesize the `string` fields and static constructors.


We should make this happen if you plan to advertise this feature for broader use, and not just as a workaround for the few folks who run into the limit.

The plan is to use the current state of the implementation, a compiler-only trick, to get feedback on the scenarios. We ran a number of benchmarks and didn't see meaningful differences that justified preemptively doing runtime work. We decided to ship it as experimental, get real-world feedback from teams like Bing, and then let that drive what we want the final state of the feature to be before we remove the experimental label.

@davidwrighton, @DamianEdwards
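For readers unfamiliar with the "`string` fields and static constructors" that the quoted spec text refers to, a hand-written analogue of the compiler-only approach might look like this. It is only an illustration under assumptions: `S_Example`, `Data`, and `Value` are made-up names (the synthesized class name starts with `<S>`, which is not a legal C# identifier), and the exact shape the compiler emits is defined by the spec, not by this sketch.

```csharp
using System;
using System.Text;

// Hand-written stand-in for one synthesized per-literal class.
internal static class S_Example
{
    // UTF-8 bytes of "Hello"; a u8 literal returned as ReadOnlySpan<byte>
    // is stored as raw data in the assembly rather than as a string.
    private static ReadOnlySpan<byte> Data => "Hello"u8;

    // The field that code would read instead of using the string literal.
    internal static readonly string Value;

    // Decodes the UTF-8 bytes into a string once, when the class is first used.
    static S_Example()
    {
        Value = Encoding.UTF8.GetString(Data);
    }
}
```

With runtime support as described in the quoted section, the bytes would still be emitted, but the `Value` field and the static constructor would no longer need to be synthesized.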

jjonescz added 3 commits November 28, 2024 10:29

Copy feature spec

9058e0f

Add "literal" to the feature flag

5b4c80d

Add more details

ec93409

ghost added the Area-Compilers and untriaged labels Nov 28, 2024

jjonescz added the Documentation and Area-Compilers labels and removed the Area-Compilers and untriaged labels Nov 28, 2024

jjonescz mentioned this pull request Nov 28, 2024

Emit opted-in string literals into data section as UTF8 #76036

Merged

jjonescz added 2 commits November 28, 2024 12:36

Convert indentation to spaces

5aeecdb

Rename feature flag

991b785

jjonescz requested review from a team, AlekseyTs and jaredpar November 28, 2024 12:07

jjonescz changed the title ~~Add spec for the utf8-string-literal-encoding feature flag~~ Add spec for the data section string literals feature flag Nov 28, 2024

jcouv reviewed Nov 30, 2024

jcouv reviewed Dec 1, 2024

jcouv self-assigned this Dec 1, 2024

jaredpar reviewed Dec 2, 2024

jjonescz added the Feature - String Literals in Data Section as UTF8 label Dec 3, 2024

jjonescz mentioned this pull request Dec 3, 2024

Test plan for "string literals in data section as utf8" #76234

Closed

Improve

eaefe41

jjonescz requested a review from jcouv December 4, 2024 20:26

AlekseyTs reviewed Dec 4, 2024

AlekseyTs reviewed Dec 4, 2024

AlekseyTs reviewed Dec 4, 2024

AlekseyTs reviewed Dec 4, 2024

AlekseyTs reviewed Dec 4, 2024

AlekseyTs reviewed Dec 4, 2024

jjonescz requested a review from AlekseyTs December 13, 2024 12:35

AlekseyTs reviewed Dec 13, 2024

jjonescz changed the title ~~Add spec for the data section string literals feature flag~~ Add spec for the data section string literals feature Dec 14, 2024

Make the shared helper private

51f9b02

AlekseyTs approved these changes Dec 16, 2024

jcouv approved these changes Dec 17, 2024

jjonescz added 2 commits December 27, 2024 16:54

Do not detect XXH128 collisions

eb7b374

Clarify feature flag values

6f8d5bc

cston reviewed Jan 14, 2025

cston reviewed Jan 14, 2025

cston reviewed Jan 14, 2025

jjonescz added 2 commits January 14, 2025 10:18

Improve wording

8b011c5

Remove note

1af62e9

cston approved these changes Jan 14, 2025

jjonescz merged commit f617688 into dotnet:main Jan 15, 2025
5 checks passed

jjonescz deleted the DataSectionStringLiterals-spec branch January 15, 2025 16:59

dotnet-policy-service bot added this to the Next milestone Jan 15, 2025

This was referenced Jan 22, 2025

[Automated] PRs inserted in VS build main-35721.127 #76844

Closed

[Automated] PRs inserted in VS build feature.debugger.shadowDebug-35722.156 #76874

Closed

[Automated] PRs inserted in VS build feature.debugger.main-35722.139 #76881

Closed

dibarbet modified the milestones: Next, 17.14 P1 Jan 28, 2025

dotnet-bot mentioned this pull request Jan 30, 2025

[Automated] PRs inserted in VS build feature.d18initial-10229.00 #76967

Closed

jkotas reviewed Feb 4, 2025

This was referenced Feb 5, 2025

Update data section string literals spec #77050

Merged

Detect data section string literal hash collisions #77061

Merged

